MIKE: An Interactive Microblogging Keyword Extractor using Contextual Semantic Smoothing
Abstract
Social media messages, such as tweets on Twitter and Short Message Service (SMS) messages on cellular networks, are short textual documents (short texts or microblog posts) exchanged among users on the Web and/or their mobile devices. Automatic keyword extraction from short texts can be applied in online applications such as tag recommendation and contextual advertising. In this paper we present MIKE, a robust interactive system for keyword extraction from single microblog posts. MIKE uses contextual semantic smoothing, a novel technique that considers term usage patterns in similar texts to improve term relevance information. Our technique incorporates the Phi coefficient, which is based on corpus-based term-to-term relatedness information and successfully handles the short-length challenge of short texts. Our experiments, conducted on multi-lingual SMS messages and English Twitter tweets, show that MIKE significantly improves keyword extraction performance beyond that achieved by Term Frequency-Inverse Document Frequency (TFIDF). MIKE also integrates a rule-based vocabulary standardizer for multi-lingual short texts, which independently improves keyword extraction performance by 14%.
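To make the term-to-term relatedness idea concrete, here is a minimal sketch of the standard Phi coefficient computed from a 2x2 contingency table of term presence across a corpus. The function name `phi_coefficient`, the representation of posts as token sets, and the toy corpus are assumptions made for illustration. The abstract does not say how MIKE combines these scores with TFIDF weights, so that step is not shown.

```python
import math


def phi_coefficient(term_a, term_b, docs):
    """Phi coefficient between the presence of two terms across a corpus.

    `docs` is an iterable of token sets (hypothetical representation).
    Returns a value in [-1, 1]; 0.0 when a marginal count is zero.
    """
    n11 = n10 = n01 = n00 = 0
    for doc in docs:
        has_a = term_a in doc
        has_b = term_b in doc
        if has_a and has_b:
            n11 += 1  # both terms present
        elif has_a:
            n10 += 1  # only term_a present
        elif has_b:
            n01 += 1  # only term_b present
        else:
            n00 += 1  # neither term present
    # Product of the four marginal totals of the contingency table.
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


# Toy corpus (illustrative only, not data from the paper).
corpus = [
    {"free", "pizza", "tonight"},
    {"pizza", "delivery"},
    {"exam", "tomorrow"},
    {"exam", "results"},
]

related = phi_coefficient("pizza", "delivery", corpus)
unrelated = phi_coefficient("pizza", "exam", corpus)
```

In this toy corpus, terms that tend to share posts receive a positive score and terms that never co-occur receive a negative one. A smoothing step could plausibly use such scores to lend weight to terms related to those in a very short post, but that use is an assumption here.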
Similar resources
Semiautomatic Image Retrieval Using the High Level Semantic Labels
Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges related to each of these approaches lead researchers to combine them and to use semi-automatic retrieval, with user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provides two kinds of qu...
Phraserate: an HTML Keyphrase Extractor
A standard feature in cataloging documents is the list of keywords. When the source documents are web pages, we can attempt to aid the cataloger by analyzing the page and presenting relevant support material. Since the keywords that occur in a document generally occur in keyphrases, and keyphrases provide contextual material for reviewing candidate keywords, they are a natural aggregate to extr...
Automatic Hashtag Recommendation in Social Networking and Microblogging Platforms Using a Knowledge-Intensive Content-based Approach
In social networking/microblogging environments, #tags are often used for categorizing messages and marking their key points. Also, since some social networks such as Twitter restrict the number of characters in messages, #tags can serve as a useful tool for helping users express their messages. In this paper, a new knowledge-intensive content-based #tag recommendation system is intr...
Word Sense Disambiguation in Contextual Dynamic Network Using Associative Concept Dictionary
Many of the Japanese ideographs (Chinese characters) have a few meanings. They should be disambiguated by using their contextual information. Some of the ideographs have a few different pronunciations depending on their meanings. For example, we have an ideograph which has two pronunciations, /hitai/ and /gaku/, the former means a forehead of the human body and the latter means an amount of mon...
What can NLP techniques do for eLearning?
The aim of the Language Technology for eLearning project is to show that current results achieved in the area of Natural Language Processing and the Semantic Web (i.e., ontologies) can play a relevant role in improving the functionality of existing Learning Management Systems (LMS). In this paper, we discuss how current NLP techniques have been employed for the development of a keywo...
Publication date: 2012